SUMMARY


RNA sequences encoding the surface projection (spike) of the coronavirus infectious 
bronchitis virus, strain Beaudette, have been cloned into pBR322 using cDNA primed 
with a specific oligonucleotide. A 5.3 kilobase viral insert in the clone pMB179 has
been identified. The region of this clone coding for the spike gene has been sequenced 
by the chain termination method, and we present here the first report of DNA 
sequence data for a coronavirus spike protein, the protein which forms the 
characteristic ‘corona’ after which the group is named. The amino acid sequence of the 
primary translation product, deduced from the DNA sequence, predicts a polypeptide 
of 1162 amino acids with a molecular weight of 127006. This has many interesting 
features which confirm and extend our knowledge of this recently characterized 
membrane glycoprotein. The polypeptide is subsequently cleaved to S1 and S2, and
partial amino acid analysis of the amino-terminus of the S1 polypeptide has been
employed to locate the position of this terminus of S1 within the large open reading
frame. The amino acid analysis also reveals the presence of an 18 amino acid putative 
signal sequence on the primary translation product which is not present on the mature 
S1 polypeptide. 


INTRODUCTION 


Infectious bronchitis virus (IBV) causes respiratory disease in the fowl and is of considerable 
economic importance to the poultry industry. The type species of the Coronaviridae, it possesses 
a single-stranded RNA genome, approximately 20 kb in length, of positive polarity which 
specifies the production of three major structural proteins: nucleocapsid protein, membrane 
glycoprotein, and spike glycoprotein. The spike protein, encoded by mRNA E, has recently 
been characterized (Cavanagh, 1983a,b,c) as comprising two or three copies each of two 
glycopolypeptides, S1 (90000 mol. wt.) and S2 (84000 mol. wt.). The polypeptide components of
the glycopolypeptides S1 and S2 have been estimated after enzymic removal of oligosaccharides
to have molecular weights of 64000 and 61000 (Cavanagh, 19834). It appears that the spike 
protein is attached to the viral membrane by S2 (Cavanagh, 1983c). A neutralizing and
haemagglutination-inhibiting monoclonal antibody produced against the spike protein binds 
the S1 glycopolypeptide, an effect which is strain-specific (Mockett et al., 1984). 

The organization of the IBV genome and subgenomic mRNAs has been studied in detail 
(Stern & Kennedy, 1980a,b; Stern & Sefton, 1984; Brown et al., 1984; Brown & Boursnell, 1984) 
and is summarized in Fig. 1. Using oligo(dT)-primed cDNA synthesis we have previously 
isolated 3.3 kb of overlapping cDNA clones extending from the 3’ poly(A) tract (Brown &
Boursnell, 1984). We report here the use of a specific oligonucleotide to prime cDNA synthesis,
which has allowed the isolation of a 5.3 kb viral insert containing the spike gene of IBV. The
region of this clone containing the spike gene has been completely sequenced on both strands. 
METHODS


cDNA cloning. The isolation of IBV strain Beaudette virion RNA has been described previously (Brown & 
Boursnell, 1984) as has the synthesis, by the phosphotriester method, of the specific oligonucleotide primer used to 
prime reverse transcription (Gait et al., 1982; Boursnell et al., 1984). cDNA synthesis was carried out using the
method of Gubler & Hoffman (1983) with approximately 20 µg of virion RNA in a final reaction volume of 50 µl.
Double-stranded cDNA was tailed with dC residues and cloned into dG-tailed PstI-cleaved pBR322. This 
material was used to transform (Hanahan, 1983) Escherichia coli LE392 and selection made for tetracycline 
resistance. Clones containing viral inserts were identified by colony hybridization (Grunstein & Hogness, 1975) 
using polynucleotide kinase 32P-labelled, alkali-treated IBV genomic RNA as a probe. The plasmid (pMB179)
which was isolated from the clone showing the strongest signal in the colony hybridization experiment was studied in
more detail. 

Subcloning for M13 sequencing. Random subclones of pMB179 were generated by cloning either DNase I
(Anderson, 1981) or sonicated (Deininger, 1983) fragments into SmaI-cut, phosphatase-treated M13mp10
(Amersham). Clones containing viral inserts were identified by colony hybridization with kinase-labelled or
reverse-transcribed viral probes. In addition, PstI and RsaI fragments were cloned into PstI-digested M13mp11
and SmaI-cut, phosphatase-treated M13mp10 respectively.

DNA sequencing. M13/dideoxynucleotide sequencing (Sanger et al., 1977) was carried out using [α-35S]dATP
(Amersham), the complete sequence being obtained on both strands. Reverse sequencing was used to obtain the 
last sequences required (Hong, 1981). The products of the sequencing reactions were analysed on buffer gradient 
gels (Biggin et al., 1983). A sonic digitizer (Graf/Bar, Science Accessories Corporation) was used to read data into
a BBC microcomputer, and data were analysed on a VAX 11/750, using the programs of Staden (1982, 1984). 

Isolation of S1 polypeptide and partial amino acid analysis. Plaque-purified IBV Beaudette was radiolabelled with
[3H]serine (Amersham) in chick kidney cells (Stern et al., 1982) and purified as described previously (Cavanagh,
1981). Viral polypeptides were resolved by SDS-polyacrylamide gel electrophoresis in 5 to 10% gels which were
fluorographed without fixation. The S1 polypeptide was eluted from the gels by electrophoresis (Welch et al.,
1981), extensively dialysed against distilled water containing 0.03% SDS and lyophilized. The powdered protein
was dissolved in 200 µl of 0.1 M-sodium bicarbonate containing 4% SDS and added to 100 mg of
p-phenylenediisothiocyanate-treated glass (17 nm pore size) prepared by the method of Wachter et al. (1973).
Following incubation for 90 min at 56 °C under nitrogen the glass was washed with water and methanol to remove
non-covalently bound material. The glass-coupled peptide was then sequenced by automated solid-phase Edman 
degradation (Brett & Findlay, 1983). 


RESULTS 
DNA sequence of the spike protein 


A 13 base oligonucleotide complementary to a sequence towards the 5’ end of clone C5.136 
(see Fig. 1) was used to prime cDNA synthesis from viral RNA. Clone pMB179 obtained from 
this experiment contained a 5.3 kilobase viral insert and in Southern blot analysis (Southern,
1975) hybridized to a small clone pMB172 which had previously been shown to contain mRNA 
E sequences by Northern blot analysis (data not presented). The DNA sequence analysis 
indicated that the 3’ end of the clone was within 12 bases of the 5’ end of the oligonucleotide used 
to prime DNA synthesis. 3645 bases of sequence containing the gene encoding the spike 
precursor protein are presented in Fig. 2. It is of note that 50 bases upstream from the AUG 
initiation codon is a sequence, AACTGAACAAAA, which resembles the homology regions 
that we have identified on the genome at the positions corresponding to the 5’ ends of the bodies 
of mRNAs A, B and C (Brown & Boursnell, 1984; Boursnell et al., 1984). This homology region
maps approximately 8 kilobases from the 3’ end of the viral genome, which is in good agreement
with the size estimates for mRNA E of 7.9 kb (Stern & Kennedy, 1980a) and 7.8 kb (Boursnell et
al., 1984) as measured by gel electrophoresis. A similar sequence, AACTGAACAATA, is
present at the predicted 5’ end of the body of mRNA D and both sequences are underlined in 
Fig. 2. The sequence containing an open reading frame of 3486 bases and the primary amino 
acid sequence of the 127006 mol. wt. protein deduced from it are presented in Fig. 2 using the 
single-letter amino acid code (Commission on Biochemical Nomenclature, 1968). Spike 
precursor synthesis is initiated at the 5’-proximal AUG of mRNA E although the sequence 
GNNAUGU occurs rarely amongst functional eukaryotic initiator sequences (Kozak, 1983). 
The 3486 base pair open reading frame is followed by two UGA termination codons. 

Fig. 1. Genomic organization of infectious bronchitis virus. The relationship between the 3’ co-terminal
nested set of mRNAs and the viral genome is indicated, as are the coding regions for the
structural proteins, spike (S), membrane (M) and nucleocapsid (N) specified by mRNAs E, C and A 
respectively. The ‘homology regions’ are sequences present in the genome at positions corresponding to 
the 5’ termini of the mRNA bodies and are thought to be involved in subgenomic RNA synthesis. The 
arrangement of the cDNA clones and the position of the primer used to generate clone pMB179 are 
shown. The sequence data presented in Fig. 2 are represented by the box below pMB179. 


Partial amino acid sequence analysis of S1 


To locate the position of the S1 polypeptide within the open reading frame, and to look for
potential signal sequences, partial amino acid sequence analysis of the amino terminus of S1 was
undertaken. The results indicated the presence of serine residues at positions 5, 6, 7, 14 and 20 in
S1. These results unambiguously identified the N-terminal amino acids of S1 within the
predicted sequence. The amino acid data indicated that an 18 amino acid signal sequence with a
typical hydrophobic core and small neutral residues, alanine and cysteine, at positions -1 and
-3 from the cleavage site (Von Heijne, 1984), is cleaved from S1. The positions of the N-terminal
amino acids of S1, and of the proposed signal sequence are shown in Fig. 2.
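
The matching step described above can be illustrated with a minimal Python sketch. It assumes that the predicted precursor is available as a string in single-letter code and that all of the first 20 Edman cycles were interpretable, so that cycles without a [3H]serine signal are taken as non-serine. The function name find_n_terminus and the toy sequence are invented for illustration; the toy sequence is not the IBV spike precursor.

```python
SERINE_CYCLES = {5, 6, 7, 14, 20}  # Edman cycles at which [3H]serine was detected
CYCLES_READ = 20                   # assumption: cycles 1-20 were all interpretable


def find_n_terminus(precursor, serine_cycles=SERINE_CYCLES, cycles=CYCLES_READ):
    """Return 0-based offsets at which the observed serine pattern fits the precursor."""
    hits = []
    for start in range(len(precursor) - cycles + 1):
        window = precursor[start:start + cycles]
        # 1-based positions of serine within this candidate N-terminal window
        observed = {i + 1 for i, aa in enumerate(window) if aa == "S"}
        if observed == serine_cycles:
            hits.append(start)
    return hits


# Invented toy sequence (NOT the IBV sequence): an 18-residue leader followed by
# a segment carrying serine at positions 5, 6, 7, 14 and 20, then a short tail.
toy_leader = "MKLLAVLLLAGLVACTCA"
toy_mature = "GNTFSSSKDEQRWSLMNPQS" + "GAVTLKDE"
toy_precursor = toy_leader + toy_mature

offsets = find_n_terminus(toy_precursor)
print(offsets)  # a single offset means the pattern fits at one place only
```

In this toy case the single offset returned equals the number of residues preceding the mature N-terminus, which is how a leader of defined length is inferred from the match.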


Structural features of the IBV spike protein 


In addition to the presence of a signal sequence at the amino terminus of S1, two other
interesting structural features of the spike precursor protein were revealed by analysis of the
predicted amino acid sequence. Firstly, the sequence contains 28 potential sites for
N-glycosylation (assuming that Asn-Pro-Thr and Asn-Pro-Ser are not used; Neuberger et al.,
1972) which are shown in Figs 2 and 3. Secondly, a hydrophilicity plot (Kyte & Doolittle, 1982) of
the amino acid sequence (see Fig. 3) shows the presence of a hydrophobic region which contains
44 non-polar amino acids preceding charged amino acids at the carboxy-terminus of S2. This
structure may anchor the spike protein to the viral envelope, as has been proposed for similar 
structures on human influenza virus and fowl plague virus haemagglutinins (Gething et al., 
1980; Porter et al., 1979). 
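
The counting rule used for the potential glycosylation sites (Asn-X-Ser or Asn-X-Thr, with X not proline) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name n_glycosylation_sites and the toy peptide are invented here and are taken neither from the spike sequence nor from the programs used in this work.

```python
import re

# Asn-X-Ser/Thr with X != Pro; the lookahead allows overlapping sequons to be counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in potential N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


# Invented toy peptide, not the IBV spike sequence.
toy = "MANGTKNPSANNTSQNAS"
sites = n_glycosylation_sites(toy)
print(sites)  # Asn-7 is skipped because it is followed by proline
```

Applied to a full predicted sequence, the length of the returned list would give the number of potential sites under the same assumption about proline.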

Fig. 3. Hydropathicity profile of the predicted amino acid sequence of the spike polypeptide. Positive
values indicate hydrophobic regions and negative values indicate hydrophilic regions. The midpoint
line represents a grand average of the hydropathy of the amino acid compositions of a large number of
sequenced proteins (Kyte & Doolittle, 1982). Each point on the graph represents the average
hydropathy of a span of 19 residues. The putative signal and anchor sequences are shown, as are the
approximate regions of the gene encoding S1 and S2. The circles below the plot show the positions of
potential glycosylation sites.
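
A profile of the kind shown in Fig. 3 can be approximated with the following Python sketch, which averages the Kyte & Doolittle (1982) hydropathy values over a sliding span of 19 residues. The function name hydropathy_profile and the toy sequence are assumptions made for illustration; this is not the program used to produce the figure.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full window; entry i covers residues i+1 to i+window."""
    scores = [KD[aa] for aa in protein]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


# Invented toy sequence: charged flanks around a non-polar stretch (not IBV data).
toy = "DEKR" * 5 + "LIVAF" * 5 + "KRDE" * 3
profile = hydropathy_profile(toy)
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak + 1, round(profile[peak], 2))  # first residue and mean of the most hydrophobic span
```

With this scale positive means correspond to hydrophobic spans, so a membrane anchor of the kind proposed for S2 would appear as a sustained positive peak.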


DISCUSSION 


The DNA sequence presented in Fig. 2 contains the complete unique region present in IBV 
mRNA E. This messenger RNA has been found to specify production of the spike precursor in a 
translation system in vitro (Stern & Sefton, 1984). The sequence predicts a primary translation 
product of 1162 amino acids with a molecular weight of 127006, which is close to that estimated 
for the polypeptide components of S1 and S2. Translation of mRNA E in vitro had indicated that
the non-glycosylated spike precursor had a molecular weight of 110000 (Stern & Sefton, 1984),
and estimates of the combined molecular weight of S1 and S2 after the removal of 
oligosaccharides by endoglycosidase H were 115000 (Stern & Sefton, 1982) and 125000 
(Cavanagh, 1983c). In addition, partial amino acid sequence analysis of the amino terminus of 
S1 has unambiguously identified the position of S1 within the predicted primary translation
product of the spike gene. 

The sequence presented has sequences AACTGAACAAAA towards the 5’ end and 
AACTGAACAATA towards the 3’ end (underlined in Fig. 2). Their high homology with 
sequences which have previously been found at the 5’ ends of the bodies of IBV mRNAs A, B 
and C, referred to in Fig. 1 as ‘homology regions’ (Brown & Boursnell, 1984; Boursnell et al.,
1984) suggests that these sequences represent the position of the 5’ ends of the bodies of mRNAs 
E and D. This is confirmed by mRNA length measurements. It is interesting to note then that 
the coding sequences for the spike gene are not completely contained within the ‘unique’ region 
of mRNA E but extend for approximately 32 bases beyond the predicted 5’ terminus of the body 
of mRNA D. A similar arrangement may be the case at the boundary of mRNAs A and B where
an open reading frame predicting a 9500 mol. wt. polypeptide extends considerably into mRNA
A (Boursnell & Brown, 1984). In both cases the homology regions appear to lie within coding 
regions and this may influence the exact sequence of these homology regions. The homology 
region at the 5’ end of mRNA D differs from that present at the 5’ end of mRNAs A, B and C in 
the presence of a G instead of a T (CTGAACAA rather than CTTAACAA) and it is interesting
to note that the presence of a T would have generated an in-frame termination codon which 
would have eliminated nine amino acid residues, four of which are charged, from the carboxy 
terminus of the polypeptide. 
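
The effect of the G/T difference on the reading frame can be checked with a short Python sketch. The reading frame of the eight-base core is not stated explicitly here, so the sketch infers it as the only frame in which the T variant, but not the G variant, contains a termination codon; the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna, offset):
    """Split dna into complete codons, starting at the given 0-based offset."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def has_stop(dna, offset):
    """True if any complete codon in this frame is a termination codon."""
    return any(c in STOP_CODONS for c in codons(dna, offset))


core_abc = "CTTAACAA"  # core at the 5' ends of the bodies of mRNAs A, B and C
core_d = "CTGAACAA"    # core at the 5' end of the body of mRNA D

# Frames in which the T variant would terminate translation but the G variant does not.
frames = [f for f in range(3) if has_stop(core_abc, f) and not has_stop(core_d, f)]
print(frames, codons(core_abc, frames[0]), codons(core_d, frames[0]))
```

In the inferred frame the T variant reads TAA, a termination codon, whereas the G variant reads GAA, which encodes glutamic acid in the standard genetic code.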

Analysis of the predicted amino acid sequence reveals three interesting structural features of 
the spike protein. Firstly the results demonstrate the presence of a typical hydrophobic signal 
sequence which is not present on the mature protein. This is commonly found in proteins which 
must pass through membranes, and is of interest because the other surface protein of IBV, the 
membrane protein, which is believed to span the membrane, does not undergo substantial post- 
translational processing and contains no obvious signal sequence (Boursnell et al., 1984). It has
been proposed in this case that an internal signal sequence may be present in the membrane 
protein. Secondly, 28 potential sites for N-linked glycosylation are present, which reflects the
very high level of glycosylation which this protein is known to undergo. It is probable that the 
majority of these sites are glycosylated in order to account for the approximately 50000 
difference in molecular weight observed between glycosylated and unglycosylated spike 
polypeptides. Mannose-rich viral glycoprotein carbohydrate side chains have molecular weights 
of approximately 2000 (Klenk & Rott, 1980). The third feature is a long stretch of non-polar 
amino acids close to the carboxy terminus of the S2 polypeptide which may serve as an anchor 
attaching the protein to the viral membrane. This agrees well with the observation (Cavanagh, 
1983c) that treatment of virions with urea resulted in the removal of S1 but not S2. Similar
‘anchor’ structures have been proposed for a number of viral proteins. 
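
The glycosylation argument made in this paragraph amounts to a simple calculation, sketched below in Python using only figures quoted in the text. It assumes, as a rough approximation, that every attached side chain is of the mannose-rich type with a molecular weight of about 2000; the variable names are illustrative.

```python
# Molecular weights quoted in the text.
glycosylated_s1 = 90000        # S1 glycopolypeptide
glycosylated_s2 = 84000        # S2 glycopolypeptide
deglycosylated_total = 125000  # S1 + S2 after removal of oligosaccharides (Cavanagh, 1983c)
side_chain = 2000              # approximate mannose-rich side chain (Klenk & Rott, 1980)
potential_sites = 28           # potential N-glycosylation sites in the predicted sequence

# Mass attributed to carbohydrate, and the number of chains needed to supply it.
carbohydrate = glycosylated_s1 + glycosylated_s2 - deglycosylated_total
chains_needed = carbohydrate / side_chain
fraction_used = chains_needed / potential_sites

print(carbohydrate, chains_needed, round(fraction_used, 3))
```

On these assumptions roughly 24 to 25 of the 28 potential sites would need to carry a side chain, which is the basis for suggesting that the majority of sites are used.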

The cloning and characterization of the spike gene of IBV has confirmed and extended 
previous observations on the surface glycoprotein of IBV. The availability of cloned spike 
sequences also represents an important step in attempts to develop a novel vaccine against IBV, 
as this viral component is thought to be involved in the induction of immunity against the 
disease. 